Algorithms for Large, Sparse Network Alignment Problems 
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Abstract 

We propose a new distributed algorithm for sparse vari- 
ants of the network alignment problem that occurs in a 
variety of data mining areas including systems biology, 
database matching, and computer vision. Our algorithm 
uses a belief propagation heuristic and provides near op- 
timal solutions for an NP-hard combinatorial optimization 
problem. We show that our algorithm is faster and outper- 
forms or nearly ties existing algorithms on synthetic prob- 
lems, a problem in bioinformatics, and a problem in on- 
tology matching. We also provide a unified framework for 
studying and comparing all network alignment solvers. 

1. Introduction 

The focus of network alignment problem is to find ap- 
proximate isomorphisms, or alignments, between similar 
graphs. It generalizes a classic problem in graph theory 
called the maximum common subgraph problem |8|. Fur- 
thermore, network alignment is widely applied to problems 
in bioinformatics 11111181 . database schema matching fl2l . 
ontology matching |23], and computer vision |3][T7), where 
it is also known as the weighted graph matching problem. 

Recently, many algorithms have been introduced to find 
approximate solutions for this problem with large graphs — 
around 10,000 vertices — where exact solutions are infea- 
sible. Despite the abundance of algorithms, little research 
has been done about the advantages and limitations of each 
algorithm, nor do we know how good they are in practice 
when the input graphs are even bigger — around 100,000 
vertices. In this paper we propose a new approach based 



Ying Wang 
Stanford University 
Inst, for Computation and 
Mathematical Engineering 
Stanford, CA 94305 
y w 1 984 @ stanford.edu 



on belief propagation that performs very well in these large 
instances. We also provide a survey of other algorithms and 
compare all algorithms on both real and synthetic data. 

1.1. Network alignment 

Consider two graphs A = (Va,Ea), B — (Vb,Eb) 
with vertex sets Va — {1>2, and Vb = 

{1', 2', . . . , to'} and let L be a bipartite graph between the 
vertices of A and B, formally L = (VaUVb, El). The goal 
is to find a matching between vertices in A and B where all 
possible matches must be from the edges of L. A matching 
in L is a subset of El such that no two edges share a com- 
mon endpoint. Let M be such a matching. For a matching 
M, we say that an edge (*,«') £ M forms a square with 
another edge (j,f) 6 M if (i,j) 6 E A and (i',f) G E B . 
In such situation we also say that the edges (i,j) <E Ea and 
(i',f) £ Eb are overlapped. The problem setup and the 
definition of squares are illustrated in Figure Q] 

Definition 1. Given A, B, and L as above, the overlap 
graph matching problem is to find a matching M that max- 
imizes the total number of squares (overlapped edges). 

A special case of this problem is the well-known max- 
imum common subgraph problem [ 8 1 when L is the com- 
plete bipartite graph and thus, it is AP-hard. In fact, it is 
also AP-hard to approximate it with an approximation ra- 
tio better than MaxCut [9|. In other words, assuming the 
unique games conjecture is correct, it is NP-hard to approx- 
imate the maximum overlap within 7 + e of the optimal 
solution where 7 « 0.878. 

When each edge (i, i') of L has a non-negative weight 
wu> , often the problem is generalized to finding a matching 




Figure 1 . The setup for the network alignment prob- 
lem. The goal is to maximize the number of squares 
in any matching while maximizing the weight of the 
matching as well. 



that is large in both total weight and the number of squares. 
This generalized version is called the network alignment 
problem and is formally defined in Section |2~Tl 

Since this is an NP-hard problem, several heuristic ap- 
proaches have been developed to address it in practice. 

1.2. Our contribution 

Most of the existing literature on the problem (except 
Klau [11]) assume that L is the complete bipartite graph, 
i.e. every pair of vertices between A and B is a valid match, 
although . We explicitly formulate the problem when there 
is only a sparse set of possible matches between the vertices 
of A and B. This formulation easily handles sparse graphs 
with over 200,000 vertices and also supports many existing 
algorithms. Under this formulation, we obtain the following 
main contributions. 

1. Our analysis provides the first unified framework for 
studying all algorithms for the problem. (Section[2]i. 

2. We propose a new distributed and efficient algorithm 
for the problem based on max-product belief propaga- 
tion (Section[3]l. 

3. Using two synthetic matching problems, we compare 
belief propagation and existing algorithms (Section|4]i 

4. We also compare all algorithms on two bioinformat- 
ics problems, and a large ontology alignment problem 
(Section|5]i. 

From these experiments we conclude that our belief prop- 
agation algorithm is fast, robust, and yields near-optimal 
objective values in all tests. Our algorithm outperforms 
or nearly ties with the best existing algorithm (due to 
Klau [11]) in terms of quality of the solution on sparse 
graphs. On dense graphs, it outperforms Klau considerably. 

Therefore, we see great potential for our approach 
in future network alignment applications. All of our 
algorithms are implemented in Matlab are available from 

http : //www. Stanford. edu/~dgleich/publications/20 9/netalign 



1.3. Related Work 

The network alignment problem has a rich and evolving 
literature. It has been studied within the context of database 
schema matching lfl2l . protein interaction alignment |fl8l 
|T9l , ontology matching 11231 . and pattern recognition 0. 

The maximum common subgraph problem is the oldest 
instance of related literature [8|. Most of the algorithms 
proposed for this problem seek an exact solution and require 
exponential time [4[. 

Other heuristics for the network alignment problem in- 
clude the IsoRank algorithm fl8l . which uses the PageRank 
idea to identify a good solution; the closely related similar- 
ity flooding algorithm |[T2l . which includes a sparse align- 
ment objective in a considerably different form, and an SDP 
relaxation [17]. Finally, Klau [11] formulates the problem 
with an equivalent quadratic program and proposes a series 
of linear programming relaxations. 

Among these algorithms only IsoRank (or similarity 
flooding, which is very similar), and Klau's algorithm are 
fast enough to handle large graphs. Therefore we focus on 
only these fast algorithms. 

2 Problem Formulation 

In this section we investigate a QP formulation of the 
problem and several relaxations. 

2.1 Quadratic Program Formulation 

First we introduced the short notation ii' instead of (i, i') 
for an edge of L. Then for each edge ii' 6 El, we assign 
a binary variable xu> that is 1 or depending on if ii' is in 
the matching M or not. The total number of squares (over- 
lapped edges) can be written in the form (l/2)x T S : x, where 
x is the \El | x 1 vector of all xu> 's and S is a 0, 1 matrix of 
size \El I x \El | such that SW jy = 1 if and only if the cor- 
responding edges ii' and jj' of L form a square (contribute 
an overlap). We construct both x and S according to a fixed 
ordering of the edges of L. 

The vector x should also induce a valid matching, so for 
any vertex of L, the sum of xu> 's on the edges incident to it 
cannot exceed 1. In particular, let A be the | Vl | x \El | un- 
oriented incidence matrix of graph L, then Ax < 1 gives 
the matching constraint. Here 1 = l n + m 6 R("+ m ) i < 1 j s 
the vector of all ones. Using these definitions, the network 
alignment problem is an integer quadratic program (QP) 



maximize aw T x + ((3/2)x T Sx 

X 

subject to Ax < 1, xw € {0, 1} 



(NAQP) 



where a, (3 are arbitrarily chosen nonnegative constants to 
address the trade-off between the two objective functions 
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Figure 2. A small sample problem and the data for the QP formulation. 



(similarity and number of squares). When a — and /3 = 1 
then the program solves the overlap graph matching prob- 
lem. Figure 12 shows an example of the network alignment 
problem and an explicit construction of the matrices S and 
A. 

When L is the complete bipartite graph between A and 
B, then we write the problem slightly differently to take 
advantage of the additional structure. Fix an ordering of 
the vertices in A and B and let A and B be the respective 
adjacency matrices. Let X be a n X m zero-one matrix 
where Xi^ = 1 if and only if the edge ii' is included in 
the matching. Let W be the matrix of weights on edges 
of L, for kk' e El then Wk,k> = wuk 1 , and let • be the 
inner-product operator between matrices. 

maximize aX • W + {(3/2)AXB T • X 
subject to Xl m < l n ,X T l n < l m (NAFQP) 

Xi,i> ={0,1} 

Throughout the rest of the paper we drop the indices from 
the vector of ones 1 when its dimension is clear from the 
context. This formulation is identical to ( |NAQP| i with S = 
B (g) A and x = vcc(X) after using the relationships be- 
tween the Kronecker product and "vec" operator [ 10 Sec- 
tion 4.6]. 

2.2. IsoRank Algorithm 

In this section, we present the IsoRank algorithm ifTSI 
and a slight modification for our sparse problems which we 
call SpalsoRank. 

The main idea of IsoRank algorithm is to approximate 
the objective of ( |NAFQP| i without direct concern for the 
matching constraints. If a = and we drop all the con- 
straints, this optimization is an eigenvector problem. To be 
a bit more democratic towards the matching objective (a ^ 



0) and the actual matching constraints, they instead formu- 
late the optimization as a PageRank problem 1151 . The in- 
tuition is that PageRank computes a real-valued heuristic 
Z where zw is based on average weight of all neighboring 
matches Zjji where S Ea and £ Eb and the 

actual matching weight ww . With this heuristic real-valued 
solution Z, they compute a binary solution X by solving a 
maximum weight matching problem where the weights are 
Z. 

IsoRank 

Input: A, B, W, damping parameters 7, niter, e 

1 V = W/(1 T W1) 

2 d A = A1,P = dmg[d A ]- 1 A 

3 d B = Bl, Q = diagfds] ^ 

4 W (0) = V 

5 for k = 1 to niter unless S < e 

6 WW =7P T VF( fc+1 )Q + (l-7)V 

7 S = \\W k - w^W 

8 end 

While the previous formulation describes the algorithm 
from ref. IfTSI . we can convert the algorithm to solve our 
sparse version of the network alignment objective. Due to 
space limitations we provide this modified algorithm that is 
denoted by SpalsoRank in Appendix l7.ll 

2.3. Linear Program Formulations 

As stated in the previous section, the network alignment 
problem is an NP-hard integer quadratic program. Find- 
ing the global maximum of that problem is hard even af- 
ter relaxing the integer constraints, because S is indefinite 
it then leads to a non-convex quadratic program. Conse- 
quently, we seek other relaxations that will yield efficient 
algorithms. The following relaxations were originally de- 



scribed by Klau [11], although we repeat them in our sparse- 
alignment formulation. 

For simplicity we often use ii'Ojj' when ii' and jj' 
form a square. Therefore, ii'Ojj' <^> Su/jj/ = 1. In 
our first relaxation, we convert ( |NAQP[ ) into a mixed in- 
teger program x T Sx = J2u'Ojf x U' x jo' • We replace 
each xu'Xjji with a new variable ywjj', but constrain 
Vii',33' < Xm>> an d Vii'jj' < Xjji . These constraints en- 
force yu'jj' < Xn'Xjf when Xw and Xjf are binary. We 
also add symmetry constraints ywjj' = Vjj',w- Notice that 
with the symmetry constraints we can drop the constraints 

Vii',33' — x n' ■ 

Before writing the new integer program, let us define 

Ys to be a matrix with the same dimension as S where 
Ys(H',jj') = Vu'.jj' when S(ii',jj') = 1 and is other- 
wise. Thus, 

maximize aw T x + f Y^wajf Vwjj' 

subject to Ax<l, !«/ G {0, 1}, (NAILP) 

Vii',33' < xn' V ii'Ojj' ', 
Ys = Y£ 

is a binary linear program to solve the network alignment 
problem. In contrast with the quadratic program, we can 
relax the binary constraint on ( INAILPb and get an efficient 
algorithm. Writing Y,wujj' Va'df as s * Y S> the re- 
laxed program is 



maximize aw T x + • Ys 
x,y z 

subject to Ax < 1, x e e [0, 1], 

Vii',33' < V ii'Ojj', 
Ys = YT 



(NARLP) 



and admits a polynomial-time solution with any linear pro- 
gram solver. Having this relaxation is advantageous be- 
cause it yields an upper bound on the objective value of 
the network alignment problem. Further, solving (INAILPb 
with a = 0, j3 = 1 allows us to get an upper bound on 
the maximum possible overlap between two networks. As 
with IsoRank, generating a discrete solution is just solving 
a maximum weight matching problem with the fractional 
solution weighting each edge. 

2.4. Klau's Linear Program Formulation 

Recently, Klau ifTTl proposed a further relaxation for 
(INARLPb based on the idea of Lagrangian decomposition 
for mathematical programming. The observation is that we 
can drop all the symmetry constraints by adding penalty 
terms of the form Uw jf (yu>,jj' — Vjj',w) to the objective 
function where u^^y's are Lagrange multipliers. Follow- 
ing this idea, we arrive at 



maximize aw x 

subject to Ax < 1, 

< x. 



§S*Y S + U S *(Y S -Y ( T 



IV ,33' 



xw G [0, 1] 
V ii'Ojj'. 



s ) 
(NALLP) 



It is clear that when Ys = Yg the two linear programs 
(INARLPb and (INALLPb are equivalent. Therefore, for any 
fixed Us the optimum solution of (INALLPb is an upper 
bound for the objective of dNARLPt which is itself an up- 
per bound for the network alignment problem. It is a well- 
known result in optimization theory that with optimal La- 
grange multipliers the two LP's have the same optimum. 
Moreover, Klau shows that for any fixed Us, ( INALLPb 
has an integer optimum solution that can be found us- 
ing a maximum weight matching (MWM) algorithm. The 
idea is to assign each variable yu'jj' (yjj>,u>) to the vari- 
able xu' (xjf) and formulate the objective as w' T x where 

aWu +max y Y^U'Ujj' Vii',33' (f + U U',33' ~ U 33',H' )■ 



w n = 
When 



+ u v 



> 0, it simplifies to w' u 



OiWu + J2ti>ajj'( 2 + U iV ,33' ~ U 33',H')- 

While these relaxations give upper bounds on the objec- 
tive, there is often a large gap between the fractional upper 
bound and the integer solution. Klau proposes a nice so- 
lution to reduce this gap. Notice that in both (INAILPb and 
(INARLPb 



E 



Vif ,33 
3 3 



J2- 



< i 



(i) 



for any fixed ii' and j', and similarly the inequality holds 
when j is fixed and the summation is over j' . This means 
that row ii' of Ys (denoted by Ys (ii', :)) should satisfy the 
matching constraint A(Ys(ii' , :)) < 1. However, when 
the symmetry constraints are removed {Ys ^ Yg), in- 
equalities (fl~|i may be violated. Klau adds these constraints 
to obtain a tighter LP. 

maximize aw T x + f S • Y s + U s • (Ys - Yg) 



subject to Ax < 1, xw € [0,1], 
Vii',33' < Xti> V ii'Ojj', 
A(Ys(ii',:)) T < 1 Vit'. 



(NATLP) 



This LP can still be solved using a MWM algorithm, and the 

term max y J^wajj' 2/«',w'(f + u **',jj' ~ u 3j',a') in each 
weight w' u , becomes the solution of a smaller MWM prob- 
lem 



1 T 



§l T + t/ s (M',:)-C/J(M',:) 



maximize Ysiii',:) 

Y 3 (ii',:) 

subject to A(Y 3 (ii', :)) 2 <1 

Klau's final algorithm is an iterative procedure that for each 
fixed Us solves \El\ small matchings like the one above 
and one large matching problem. Then it updates Us using 
a sub-gradient method and repeats. Because each iteration 
of this algorithm calls many MWM functions, we call this 
algorithm the matching relaxation or MR for short. 

We need two quick short-hands to compactly write the 
algorithm. First, define bound a ,b z = min(6, max(a, z)), 
and let both bound a h x and bound a j, A be defined element- 
wise. 



Second, define d, Sl = maxrowmatch(S\ L), where 
each entry in d is the result of a MWM on all the other 
edges in that row of S, with weights from the corre- 
sponding entries of S. Written formally, — bipar- 
tite _match({(j, f) : Sw t jj')}). The matrix Sl has a 1 for 
any edge used in the optimal solution of the MWM problem 
for a row. 

NetAlignMR 

Input: S, w, damping parameters 7, niter, 

1 LT(°) =0 

2 for k = 1 to niter 

3 d, S L = maxrowmateh((/3/2)S + U -U T ,L) 

4 \y( fe ) = aw + d 

5 x( fc ) = bipartite _match(L,w( fc )) 

e F = U^-V - ~/X( k hriu{S L ) + ~/tril{S L ) T X^ 
1 U {k) = bound F 

-0.5 : 0.5 

8 end 

Here triu(Si) (tri^Sz,)) represents the upper (lower) tri- 
angular part of Sl- There is no need to round these solu- 
tions as x' fe ' is always a binary solution. 

3. A belief propagation algorithm 

In this section we introduce a distributed algorithm that 
is inspired by a recent result on solving the maximum 
weight matching problem using the max-product version of 
belief propagation (BP) [ 1 1. For simplicity, we refer to this 
algorithm by BP. We will show in Sections |U|5] that BP al- 
gorithm always produces better results than IsoRank, and 
compared to NetAlignMR, BP is either better or its solu- 
tions are very close. But, BP has a better running time than 
NetAlignMR and IsoRank. 

The goal of the BP algorithm that we develop here is to 
solve the integer program ( |NAQP| i directly. The main intu- 
ition behind this algorithm (and, indeed, all BP algorithms) 
is that each vertex of the graph assumes the graph has no 
cycles, and makes the best (greedy) decision based on this 
assumption. 

Recently, BP and its variations have been very successful 
for the solution of random constraint satisfaction problems 
iTOI . It has also been known for many years in the context 
of coding theory [7|, artificial intelligence [16|, and com- 
puter vision ll20l . Recent applications can also be found in 
systems biology |24| and data clustering |6|. 

Previously, BP was used to find a maximum weight 
matching in bipartite graphs [1|, and its convergence and 
correctness was rigorously proved. These results show that 
a BP algorithm correctly solves the case of (3 = 0. Here we 
modify that algorithm to accommodate the quadratic term 
(J3 > 0), which involves all squares between the pairs of 
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Figure 3. The graph (b) is the factor-graph represen- 
tation of the problem in (a). 



edges in El, as well. It is a distributed algorithm that runs 
by passing messages along the edges of graph L and also 
along the squares. 

Next, we will define a Markov random field (MRF) on 
a suitable factor-graph where the maximum a posteriori as- 
signment (MAP) corresponds to the optimum integer solu- 
tion of ( |NAQP| i. Then we use the max-product version of 
BP to find an approximate solution to this MAP assignment. 
Our final BP algorithm will be a simplified version the max- 
product algorithm. We also present a matrix formulation of 
this simplified version in Section [331 that is useful for im- 
plementations of the algorithm. 

3.1. Factor-graph representation F(A,B,L) 

In this section we use a standard factor-graph represen- 
tation C6) for ( |NAQP| >. Recall that V A = {1, 2, . . . , n} 
and Vb = {1,2, ...,m}. For any square formed by 
the two edges ii' and jj' of El, create a new vertex 
(ii'jj 1 ), and denote the set of all such vertices by Vs = 
\ ii'Ojj'} ■ Now, define a factor-graph with vari- 
able nodes (x£ L ,xs) 6 {0, Ij-I^l+I^s^ anc [ t wo types of 
function nodes for enforcing the integer constraints. For 
each vertex i e Va (j £ Vb) define a function node 
fi([xa>]i<) {gv ([xu']i)). Also for each {ii'jj') € Vs de- 
fine a function node hu/jji ([xu> , Xjj> , xwjji]\ by 

fi([xu']i) = l{E,'^ii'<i}' 9i'([xw]i) = l{E,^, s '<i}' 

hii'jj>([Xii',Xjj',Xwjj']) — l{x ii /j j /=x ii /x j j/}- 

In other words, the function node /j (g^) enforces the 
matching constraint at i (i'), and the function node hwjj' 
guarantees that xwjj' — 1 if and only if xw = Xjji = 1, 

To simplify the notation we use xp for any subset C 6 
El x Vs to denote the vector [Xcjcgc- The set of neigh- 
bors of a node v in a graph G is denoted by Oqv or simply 
by dv. Therefore, xg^, xg 9i , and xg^ M ,.. ; denote [sjj/]^, 



[xii']j and [xw ,Xjj',xwjj'] respectively. Figure [3] shows 
an example of a graph pair A, B and their factor-graph rep- 
resentation as described above. Now define the following 
probability distribution 



p(xEz,> x s) oc e 



aw T x El +£i T 



1 -IIM*™ 



* n^'C* 8 *') n hii'jj'ixdhH,^)- (2) 

i' — 1 ii'jj'GVs 

Note that the exponent aw T Xf; i + ^l T xs is equal to 
aw T x + (p/2)x T Sx where S is the matrix of squares that 
is defined in Section lZTI Moreover, it is simple to see a 1-1 
correspondence between the feasible solutions of ( |NAQP| i 
and the support of the probability distribution (0. The fol- 
lowing lemma formally states this observation. 

Lemma 2. For any (x£ L ,xs) £ {0, lj'^'+l 5 ' with non- 
zero probability, p(xe l ,x$) ^ 0, the vector satisfies 
the constraints of the integer program ( |NAQP| >. Conversely, 
any feasible solution Xg i to ( |NAQP| l has a unique coun- 
terpart {~X-e l j Xg) w/f/z non-zero probability p{x.E L j xg) = 

aw T x+(/3/2)x T Sx 



Moreover, the pair with maximum probability (MAP as- 
signment) is the optimum solution to ( |NAQP| i. 

Lemma 3. The vector (xg t , xj) is fne MAP assignment to 
©, argmax XEi xs p(xg L , Xg), «/an onfy if~x*E L °P" 
timum solution to ( |NAQP| l ant/ xlj is f/ze vector of squares. 

Therefore, we can use max-product version of belief 
propagation (BP) that approximates MAP assignment of (fJJ 
to find an approximate solution to ( |NAQP| i. The follow- 
ing algorithm is a simplified version for the standard max- 
product algorithm on our factor-graph. We discuss the de- 
tails of the derivation of this algorithm from the standard 
version of max-product in Appendix 17. 21 



BP 



INPUT: The weighted bipartite graph L = (Va^Vb,El). 
OUTPUT : A subset of the edges of El that satisfy the con- 
straints of the quadratic programming ( |NAQP| i 1 . 



(1) At times t = 0,1,. 



sages of the form to$ . , and mlt 

It *Ji 



each edge ii' sends two mes- 
and also sends 



one message of the form rn£)^ h ; , for any square 

(m'jJ 1 ) G Vs. All messages of time t — are initial- 
ized by an arbitrary number (let us say 0). 



"'-►Si' 



'it will be described later in this section that the output of the BP al- 
gorithm should be modified in order to satisfy the constrains of the QP 
(NAQP). 



(2) For t > 1, the messages in iteration t are obtained from 
the messages in iteration t—1 recursively. In particular 
for all ii' e E L 



m.-i t = cewu' — max \ m,.. 



E 



ii'jj'£Vs 



P ,„ P 



max(0,^+m(.*7^ ,..,)) • (3) 



The update rule for m--, is similar to the update rule for 



rrr-l t i and 



(*) 

nr.. I 



= ai«ii' — max \ m,., 



max m'17 1 , 



fetVji' 

ii' kk' £ Vcj 

where (a) + means max(a, 0). 

(3) At the end of iteration t each vertex i selects the 
edge ii' that sends the maximum incoming message 
mf) f to it. Denote the set of these selected edges 
by M(t). Now, repeat (2)-(3) until M(t) converges. 



Convergence of BP. In practice M (t) does not necessar- 
ily converge, and in most cases it oscillates between a few 
states. Therefore, we can terminate the algorithm when 
such an oscillation is observed, and use the current mes- 
sages of the edges as weights of a MWM problem to obtain 
an integer solution. Another approach for resolving the os- 
cillation, 1 14, 2, 6|, is to use a damping factor 7 6 [0,1]. Let 
m(i) be the vector of all messages at time t. That is m(i) is 
a fixed ordering of all messages at time t. Then the update 
equations ©-(HI can be rewritten as m(t) = F(m(t — 1)) 
where T is the unique operator that is defined by BP equa- 
tions ©-(Hi. Now one can consider a new operator Q de- 
fined by G(m{t)) = (1 - 7*)m(f - 1) + 7*J ?r (m(i - 1)) and 
update the messages using Q instead of T. The new update 
equations will converge for 7 < 1. 

3.2. The intuition behind BP 

Although, we have explained the standard derivation of 
BP algorithm in Appendix 17.21 in this section we explain 
a direct intuitive explanation of this algorithm for solving 
( |NAQP 1. The algorithm exploits the fact that the constraints 



of ( NAQP| i are local. This means that if each edge of the 



graph L is an agent that can talk to other local agents, then 



they can verify the feasibility of any solution to ( |NAQP[ ). 
They can also calculate the cost of each solution (au> T x + 
§x T £x) locally. 

Based on the above intuition, each agent should com- 
municate to the agents sharing a vertex with him or her 
to control the matching constraints. Messages of the type 



m 



:*) 

»*'—►/< 



and m 



(*) 



are serving that purpose. They also 
contain the information about the weights of the edges (term 
aw T x in the cost function). Similarly, any two agents that 
form a square should communicate, so that they can calcu- 
late the term &x T Sx of the cost function. This information 

is passed by the messages of type m?)^ h . 

When the factor-graph has no cycle (is a forest) remov- 
ing an edge (or agent) and all function nodes connected 
to it, splits the connected components containing the end- 
points of that edge. This means that the optimization prob- 
lem ( |NAQP| i could be solved independently on each com- 



of the form m 



carries the 



ponent. Ihus, a message ui uul, iuim uL U ,^f. 
information about the component that contains %' and there- 
fore is a function of the messages m^*7_^ s , (k ^ i). It also 
contains the information about all squares that contain ii'. 
Ideally, the message mf-j^^ should show the amount of 
change in the cost function (excluding the connected com- 
ponent containing i) by participation of the edge ii' in a 
solution. Similarly, each message of the type mfl h 
should be the change in the cost function by participation of 
jf (restricted to the component the edges j j'). 

Now we give a rough derivation of BP equation (01 using 
the above discussion. Similarly the equation can be ex- 
plained. If ii' is present in the solution, then aww is added 
to the cost function. But none of the edges ki' (k ^ i) 
can now be in the matching. Thus, we should subtract their 



maximum contribution 



,(*-!) 



This ex- 



(maxfc^ 

plains the first two terms in the right hand side of equation 
(0. Moreover, we should add the number of squares that 
will be added by this edge. For each square (ii'jf) £ Vs 
if the edge jf is not present in the matching then noth- 
ing is added. But if it is present then a (3/2 plus the term 
Trifl h should be added. This roughly explains the 

JJ n ii'jj' 

addition of the third term in (|3). 

If the factor-graph F is a forest then one can show by 
induction that the messages and M(t) will converge after 
t > T iterations where T is the length of the longest path 
in F. Moreover the values of each messages will be ex- 
actly the desired value that is explained above (the amount 
of change in the cost, by participation of an edge). 

3.3. A matrix formulation 

For A e K m >" and x E K", define A □ x to be a vector 
in M mx 1 where its r th entry is maxj a r jXj. This operator is 



bound £ (S(*- 1 ) J 



just the regular matrix-vector product but with the summa- 
tion (Ax)i — Y^j a i,j x j replaced by maximization. 

Now we present a matrix formulation of the simpli- 
fied BP algorithm from Section 17.21 We need to split the 
constraint matrix A into [Aj A^] T corresponding to the 
matching constraints in graph A and graph B, respectively. 

NetAlignBP 

Input: A, B, W, damping parameter 7, niter 

1 y(°) =0,z(°) =0,S(°) =0 

2 for t = 1 to niter 

3 F 

4 d = F ■ e 

5 yW = aw - bound 0iOO [(A^A r - /) □ z^ 1 )] + d 

6 zW = aw - boundo,oo[(^c A c - J) □ y (t_1) ] + d 

7 = (yW + ZW - aW -D) ■ S F 

8 end 

To produce a binary output, NetAlignBP ends by solving 
two maximum weight matching problem weighted by the 
messages y( 4 ' and j«, It then outputs the solution with the 
best objective. 

3.4. Improved BP 

Recall that Klau's ifTTI algorithm is obtained by refining 
the linear pro gram INALLPI using combinatorial properties 
of the problem. Similarly, we can modify the factor-graph 
representation of Section [XT1 to improve solutions of BP at 
the expense of increasing the running time. Here is a rough 
explanation of this modification. For each variable node ii' 
add function nodes dwj and dii',j' when ii'Djj'. These 
function nodes are defined by: 

du',j([x jk ']k',jk'au') = l {T, kltjhinul x jkl <i-} 

dii',j'([xkj'] k ,kfOii') = hi: hM ,nii>*kj><i}- 

These function nodes enforce the matching constraints at 
nodes j, j' among all edges that form a square with ii'. 

4. Synthetic experiments 

In this section we generate two types of synthetic net- 
work alignment data and compare our NetAlignBP algo- 
rithm with IsoRank and NetAlignMR. 

In the first class of data, we start by taking two copies 
of a k x k grid as A and B. Then for each vertex i 6 Va 
and its corresponding copy in Vb we add the edge ii' to L. 
We call these | | edges correct. Then for any two random 
vertex pairs i G Va, j' 6 Vb we add the edge if to L, 
independently, with probability p for some fixed constant 
< p < 1. 



expecied degree of n 



(a) Grid graphs (k = 20, q = 2, d = 1) 



MR-upper 

- * - MR 

-0- IsoRank 

) 5 10 

expecied degree of noise in L (p - n) 

(b) Power-law graphs (9 = 1.8, n = 400, q = 1) 



Figure 4. Upper bounds and solutions to synthetic problems on grid-graphs (a) and power-law graphs (b). Dark lines 
are values of the network alignment objective and light lines are the number of correct matches. The BP algorithms 
performs the best in terms of objective and correct matches. See Section|4]for more information. 



In order to make the model more realistic, we need to 
perturb graphs A, B by adding an edge between any two 
random vertices u, v £ Va ( u', v' £ Vb ), independently, 
with probability proportion to q/dA(u,v) 2 (q/ 'ds{u' \v') 2 ). 
Where da(u, v) is the distance between u, v in G. 

In these problems, the ideal alignment is known: the set 
of all correct edges ii'. In real- world networks, each cor- 
rect edge ii' may be confused with others edges jf where 
d,A(i,j) and dB(i',j') are small. To capture this aspect, we 
add some random edges to L within graph distance d of the 
end points of the ideal alignment. 

In the second class, we let A, B be random power-law 
degree sequence graphs with exponent 8 and n vertices. 
This means the fraction of vertices with degree k is propor- 
tion to k~ 6 . Similarly, we add correct and noisy edges to L 
and perturb graphs A, B. But we do not add the additional 
distance based noise to L. 

In our results, we compare the outputs of all algorithms 
to the correct matching edges ii' between the graphs A and 
B. Figure|4]shows the average fraction of the correct match- 
ing obtained by each algorithm over 10 trials. Here the ob- 
jective value of the network alignment is the total number 
of squares, and the dark lines in the figure show the ratio 
of the algorithm's objective to the objective value when the 
correct matching is used. 

Although the larger values for the objective tend to re- 
cover a higher fraction of the correct matching, the correct 
matching may not produce the best objective (highest num- 
ber of squares) when L is highly corrupted with a large 
number of edges. This explains a few cases where the frac- 
tion is more than 1 . Similarly, the light lines show the frac- 
tion of the correct matches that each algorithm has recov- 



Problem 


\Va\ 


\Ea\ 


\Vb\ 


\E B \ 


\E L \ 


dmela-scere 


9459 


25636 


5696 


31261 


34582 


Mus M.-Homo S. 


3247 


2793 


9695 


32890 


15810 


Icsh2wiki-small 


1919 


1565 


2000 


3904 


16952 


lcsh2wiki-full 


297266 


248230 


205948 


382353 


4971629 



Table 1 . Real dataset problem sizes. 



ered. These values track the objective values showing that 
the network alignment objective is a good surrogate for the 
number of correct matches objective. 

We see that BP and MR produce the best solutions. 
When the amount of random noise in L exceeds an expected 
degree (n ■ p) of 10 for the grid graphs and 8 for the power- 
law graphs, many of the algorithms are no longer able to 
obtain good solutions. In this regime, the BP algorithm per- 
forms significantly better than the MR algorithm. 

We used the BP algorithm with a = 1, j3 = 2, the Iso- 
Rank algorithm with 7 = 0.95, and the MR algorithm with 
a = 0, j3 = 1 for these experiments. We ran NetAlignMR 
for 1000 iterations, NetAlignBP for 100 iterations, and Iso- 
Rank for 100 iterations and took the best objective value 
from any iteration. 

5. Real datasets 

While we saw that the BP algorithm performed well on 
noisy synthetic problems in the previous section, in this sec- 
tion we investigate alignment problems from bioinformatics 
and ontology matching. Table Q] summarizes the properties 
of these problems. 



5.1. Bioinformatics 

The alignment of protein-protein interaction (PPI) net- 
works of different species is an important problem in bioin- 
formatics lPT8l . We consider aligning the PPI network 
of Drosophila melanogaster (fly) and Saccharomyces cere- 
visiae (yeast), and Homo sapiens (human) and Mus mus- 
culus (mouse). These PPI networks are available in sev- 
eral open databases and they are used in [19] and ifTTl . re- 
spectively. While the results of the experiment are rich in 
biological information, we focus our interest solely on the 
optimization problem. 

Figure 15.11 shows the performance of the three algo- 
rithms (NetAlignBP, NetAlignMR, SpalsoRank) on these 
two alignments. For each algorithm, we run it for 100 it- 
erations (SpalsoRank and NetAlignBP) or 500 iterations 
(NetAlignMR) with varied a and (3 parameters and various 
damping parameters and stepsizes. For each combination 
of parameters, we record the best iterate ever generated and 
plot the overlap and weight of the alignments in the figure. 

In both problems NetAlignBP and NetAlignMR man- 
age to obtain near-optimal solutions. NetAlignMR slightly 
outperforms NetAlignBP, and they both dominate IsoRank. 
NetAlignBP has a better running time instead. 

5.2. Ontology 

Our original motivation for investigating network 
alignment is aligning the network of subject headings 
from the Library of Congress with the categories from 
Wikipedia [22|. Each node in these networks has a small 
text label, and we use a Lucene search index [21] to quickly 
find potential matches between nodes based on the text. 
To score the matches, we use the SoftTF-IDF scoring rou- 
tine 0. Our real problem is to match the entire graphs. 
From this problem we extract a small instance that should 
capture the most important nodes in the problem. 2 The re- 
sults are shown in Figure lBTTI We leave detailed description 
of this data-set and implications of the network alignment 
to longer version of the paper. 

We saw similar behaviors as in Section lBTT] NetAlignMR 
and BP are close in overlap and they outperform IsoRank. 
Though not shown in the figure, BP obtains a lower bound 
of 16204 in lcsh2wiki-full with 7 = 0.9995, a = 0.2 and 
(3=1. Further experiments with Klau's LP relaxation 
solved to optimality (which means Ys = Yg) yields the 
lower bound 17302 and upper bound of 17571. 

In all our real datasets L is quite sparse, making Net- 
AlignMR more favorable. Still, BP nearly matches it and 
is faster. IsoRank is outperformed in these experiments, but 
it was originally designed for completely dense problems 
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Figure 5. Results of the three algorithms Net- 
AlignBP, SpalsoRank, and NetAlignMR on the 
dmela-scere alignment from [19| (top), and on the 
Mus M.-Homo S. alignment from [11] (bottom) 



and multi-way matching, i.e. between more than two PPI 
networks. 

6. Conclusions 

In all of our experiments NetAlignBP yields near- 
optimal solutions, and it significantly outperforms other al- 
gorithms when L is dense. The experiments also show that 
results of NetAlignBP and NetAlignMR are within 95% of 
the upper bound in large graphs with 5,000,000 potential 
edges and hundreds of thousands of vertices, which is very 
promising since the problem is APX-hard. Our study shows 
that the current network alignment algorithms are fast and 
good enough to handle large-scale real world problems. 

In the experiments, we observed that NetAlignMR in 
general takes a large number of iterations to produce a in- 
teger solution, while the upper bound converges much ear- 
lier. Therefore, in practice we suggest using NetAlignBP 
for finding the solution and NetAlignMR for upper bound 
when running time is a concern. 

An issue with all algorithms for this problem is that the 
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Figure 6. Results of the three algorithms Net- 
AlignBP, SpalsoRank, and NetAlignMR on 
lcsh2wiki-small (top), and on lcsh2wiki-full (bottom) 



performance is greatly affected by the choice of parameters. 
While the optimal choice of parameters depends on the ac- 
tual instance, we have some general suggestion for choosing 
the parameters: In NetAlignBP, the best outcome is usually 
obtained by setting 7 close to 1 and run for a large number 
of iterations. In NetAlignMR, setting a > greatly reduces 
the number of iterations to achieve a good overlap, but the 
best overlap is usually obtained with a = after a large 
number of iterations. In practice, all algorithms are fast so 
it is possible to try various combinations of parameters. 

When the graph L is dense, all these algorithms are im- 
practical for large graphs. NetAlignBP suffers from large 
storage demand (\Ea\ X \Eb\). NetAlignMR has similar 
storage limitations and becomes very slow since it is solv- 
ing a large number of MWM instances per iteration. When 
L is the complete bipartite graph, IsoRank does not need 
to form the matrix S explicitly, thus greatly saving on stor- 
age. One future direction will be finding an algorithm that 
performs well when L is dense; perhaps a combination of 
IsoRank and BP may suffice. 

NetAlignMR is based on (INATLPl i and the key idea is 
solving a sub matching problem for each row of S. It is a 



tempting idea to combine this technique with NetAlignBP. 
Furthermore, our experiments show that ( INATLPI ) has a 
small integrality gap in most cases. Finding another LP re- 
laxation with smaller integrality gap is an interesting prob- 
lem. 

Another open area is finding approximation algorithms 
for the network alignment problem (or to make it easier, the 
maximum common subgraph problem). We know that it is 
APX-hard from the reduction to max-cut, but it does not 
rule out constant approximation ratio. On the other hand, 
no non-trivial approximation is known yet, and we would 
like to see this gap closed. 
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7. Appendix 

7.1. Sparse IsoRank algorithm 

Recall that S was just A ® B when L is the complete bi- 
partite graph. By writing all the matrices W, V in a vector 
format w, v we obtain the SpalsoRank algorithm. 

SpalsoRank 



Input: S, L, damping parameter 7, niter, e 

1 v = w/(e T w) 

2 d=Sl, P^dmgid]-^ 

3 w(°) = V 

4 for k = 1 to niter unless S < e 

5 w ( fc ) = 7 P T w ( fe - 1 ) + (1 - 7)v 

6 S = || wW -w^-^H 

7 end 

7.2. Max-product belief propagation equa- 
tions 

Belief propagation algorithm (and its max-product ver- 
sion) is an iterative procedure for passing message along the 
edges of a factor graph |fl6l . We use notation t = 0, 1, . . . 
for discrete time-stamps. Applying this standard procedure 
to our factor-graph representation of Section [XT1 we obtain 
two types of real-valued BP messages: 1) From variable 
nodes to function nodes 2) From function nodes to variable 
nodes. The BP messages from the variable nodes ii' are 

(U'jj')eVs 

(5) 

( x w ) = i>}!_-«' ) II t Xu ' ) ' 

(ii'jj')eVs 

(6) 

n <L^<(^). <?> 

kk'^jj' 

Similarly, the BP messages from the variable nodes ii' jf 
satisfy 

v ii>ji'^h u , jjl i x ii'jj') = Y\. ^Ki'kk'^ii'n' ( Xii 'if)- 

(8) 

The opposite direction messages (from function nodes to 
variable nodes) are 

) = max e " eaf ' " V Xii ' fi ( x a h ) 
x a/i\{"'} 

n (*«'), (9) 

ij'£dfi\{ii'} 

i>^ i Axw)= max e° E "'««i'"*' x 'V(xa*,) 
x e Si ,\{u>} 

n v vU„M's\ d°) 

i'j£dgi'\{ii'} 



|http : / /en . wikipedia .org/wiki/Wikipedia : Database_download| 



t'h —it' s **') = max e^ /2x «'«'/^jy(xgfo 4 ., ,) 

x ji'< x u'ji' 



^ ni'jj'( x O'n') = max ^^"'"'ki'ij'fahtw) 

" 33 JJ X it f ,Xjjl JJ 

4U w (^)«'i5L w («iiO- (12) 

At the end of each iteration t, each variable node x^i 
(xu'jj') is assigned a binary value as follows: 

-(*) - 

arg max vf^ u , {x iv (x«/ ) JJ fljjj,^ _ w , (z* 

\ («'jj')ev s 



In many applications as t — > oo the assigned values 
x|*/ , x|*/ converge to good approximate solution. These 
standard BP equations have redundancies and in practice, 
one can obtain a more simple but equivalent vejj'ion of the 
algorithm. Next, we explain this simplification. 

Simplified BP equations. Now we simplify the above 
set of equations in a few steps: 1) Eliminate all messages 
from the function nodes (v) by substituting them in ©- 
© with the right hand sides of ©-([T2]i. 2) Since the 
variables xu> and xu>jj> are binary valued, compress the 
messages further by sending just the log-likelihood values 

*4L fi = Iog(4L A (l)/4w«(0))- Similarly define 

(*) (*) A (*) 1\ 

messages m^.,, m^^, . ., , and m^^^,.,,. 3) 

Eliminate the messages of the form mi'L, from the 

equations. 

Lemma 4. The max-product equations (I5ll- (I121 > are equiv- 
alent to the simplified BP equations (O-©. 



